-
-
Notifications
You must be signed in to change notification settings - Fork 19
optimize seq-consuming library functions #1198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize seq-consuming library functions #1198
Conversation
19ad10e to
d4b1994
Compare
|
Hi @chrisrink10, I attempted to properly support the generator case in the I’m leaning toward another approach: explicitly dropping support for generators in What do you think? Thanks |
6675abd to
b22ad57
Compare
I thought the intent was to call |
After reconsidering the issue while working on it, my opinion has changed and I no longer believe that generators should be supported by
(defn empty?
"Returns true if coll has no items - same as (not (seq coll)).
Please use the idiom (seq x) rather than (not (empty? x))"
{:added "1.0"
:static true}
[coll] (not (seq coll)))
Please consider this a checkpoint for deciding how to proceed. Options:
I'm more in favor of (2) now. Let me know of your thoughts and insight how to proceed forward. Thanks |
|
https://clojure.org/reference/sequences says
From the above quotes, we can deduce that functions take host seq is not optimization. It is about immutability and structural sharing. Host I think clojure functions were designed to inter-operate with host collections seamlessly while making them as immutable and persistent as possible. Rich Hickey wants you to seamlessly interoperate with host collections. |
I think I still prefer (1), but I do believe that the idiomatic empty check issue is somewhat of a wrinkle. I think that could be solved by documentation. I would tend to agree with @amano-kenji (and Rich Hickey) that we should be allowing and interoperating freely with host collections, so I don't want to create a situation where calling |
I agree with the point of interoperating with host collections. However, I think when it comes to iterators (see generators are iterators), Clojure doesn't interop with them directly, calling For example, both user=> (import [java.util Iterator]
[java.util.stream Stream])
java.util.stream.Stream
user=> (defn count-iterator [max]
(let [state (atom 0)]
(reify Iterator
(hasNext [_]
(< @state max))
(next [_]
(let [val @state]
(swap! state inc)
val)))))
#'user/count-iterator
user=> (def it (count-iterator 3))
#'user/it
user=> (seq it)
Execution error (IllegalArgumentException) at user/eval141 (REPL:1).
Don't know how to create ISeq from: user$count_iterator$reify__138
user=> (iterator-seq it)
(0 1 2)
user=> (defn stream-count-iterator [max]
(-> (Stream/iterate 0 (reify java.util.function.UnaryOperator
(apply [_ n] (inc n))))
(.limit max)))
#'user/stream-count-iterator
user=> (def sit (stream-count-iterator 3))
#'user/sit
user=> (seq sit)
Execution error (IllegalArgumentException) at user/eval148 (REPL:1).
Don't know how to create ISeq from: java.util.stream.SliceOps$1
user=> (stream-seq! sit)
(0 1 2)thanks |
Python streamsOn JVM, java streams are a sequence of discrete objects which can be mapped to clojure sequence. Python streams are asynchnorous iterables suitable for
Python iteratorsGenerator is both On python and JVM, an On Difference between Python's Generators and Iterators, a comment says an Iterator is an Iterable which requires If a generator wasn't ConclusionEvery conceivable discrete sequence in python is The list of
You don't need |
But in the ;; Note this is not strictly necessary since keySet is a collection
;; implementing Iterable but it does show the usage.
user=> (iterator-seq (.iterator (.keySet (java.lang.System/getProperties))))Is the intent that something like this would fail too? user=> (seq (.keySet (java.lang.System/getProperties)))
("java.specification.version" "sun.jnu.encoding" ...) |
The HashMap keySet method just returns a Java
No, this simply extracts the keys as a However, note that user=> (def props (.keySet (java.lang.System/getProperties)))
#'user/props
user=> (seq props)
("java.specification.version" "sun.cpu.isalist" "sun.jnu.encoding" "java.class.path" "java.vm.vendor" "sun.arch.data.model" "user.variant" "java.vendor.url" "user.timezone" "clojure.basis" "java.vm.specification.version" "os.name" "sun.java.launcher" "user.country" "sun.boot.library.path" "sun.java.command" "jdk.debug" "sun.cpu.endian" "user.home" "user.language" "sun.stderr.encoding" "java.specification.vendor" "java.version.date" "java.home" "file.separator" "java.vm.compressedOopsMode" "line.separator" "sun.stdout.encoding" "java.vm.specification.vendor" "java.specification.name" "user.script" "sun.management.compiler" "java.runtime.version" "user.name" "path.separator" "os.version" "java.runtime.name" "file.encoding" "java.vm.name" "java.vendor.url.bug" "java.io.tmpdir" "java.version" "user.dir" "os.arch" "java.vm.specification.name" "sun.os.patch.level" "native.encoding" "java.library.path" "java.vm.info" "java.vendor" "java.vm.version" "sun.io.unicode.encoding" "java.class.version")
user=> (seq props)
("java.specification.version" "sun.cpu.isalist" "sun.jnu.encoding" "java.class.path" "java.vm.vendor" "sun.arch.data.model" "user.variant" "java.vendor.url" "user.timezone" "clojure.basis" "java.vm.specification.version" "os.name" "sun.java.launcher" "user.country" "sun.boot.library.path" "sun.java.command" "jdk.debug" "sun.cpu.endian" "user.home" "user.language" "sun.stderr.encoding" "java.specification.vendor" "java.version.date" "java.home" "file.separator" "java.vm.compressedOopsMode" "line.separator" "sun.stdout.encoding" "java.vm.specification.vendor" "java.specification.name" "user.script" "sun.management.compiler" "java.runtime.version" "user.name" "path.separator" "os.version" "java.runtime.name" "file.encoding" "java.vm.name" "java.vendor.url.bug" "java.io.tmpdir" "java.version" "user.dir" "os.arch" "java.vm.specification.name" "sun.os.patch.level" "native.encoding" "java.library.path" "java.vm.info" "java.vendor" "java.vm.version" "sun.io.unicode.encoding" "java.class.version")
user=> (def iprops (.iterator props))
#'user/iprops
user=> (iterator-seq iprops)
("java.specification.version" "sun.cpu.isalist" "sun.jnu.encoding" "java.class.path" "java.vm.vendor" "sun.arch.data.model" "user.variant" "java.vendor.url" "user.timezone" "clojure.basis" "java.vm.specification.version" "os.name" "sun.java.launcher" "user.country" "sun.boot.library.path" "sun.java.command" "jdk.debug" "sun.cpu.endian" "user.home" "user.language" "sun.stderr.encoding" "java.specification.vendor" "java.version.date" "java.home" "file.separator" "java.vm.compressedOopsMode" "line.separator" "sun.stdout.encoding" "java.vm.specification.vendor" "java.specification.name" "user.script" "sun.management.compiler" "java.runtime.version" "user.name" "path.separator" "os.version" "java.runtime.name" "file.encoding" "java.vm.name" "java.vendor.url.bug" "java.io.tmpdir" "java.version" "user.dir" "os.arch" "java.vm.specification.name" "sun.os.patch.level" "native.encoding" "java.library.path" "java.vm.info" "java.vendor" "java.vm.version" "sun.io.unicode.encoding" "java.class.version")
user=> (iterator-seq iprops) # the iterator was consumed
nil
user=> (seq props) # the original Set is still intact
("java.specification.version" "sun.cpu.isalist" "sun.jnu.encoding" "java.class.path" "java.vm.vendor" "sun.arch.data.model" "user.variant" "java.vendor.url" "user.timezone" "clojure.basis" "java.vm.specification.version" "os.name" "sun.java.launcher" "user.country" "sun.boot.library.path" "sun.java.command" "jdk.debug" "sun.cpu.endian" "user.home" "user.language" "sun.stderr.encoding" "java.specification.vendor" "java.version.date" "java.home" "file.separator" "java.vm.compressedOopsMode" "line.separator" "sun.stdout.encoding" "java.vm.specification.vendor" "java.specification.name" "user.script" "sun.management.compiler" "java.runtime.version" "user.name" "path.separator" "os.version" "java.runtime.name" "file.encoding" "java.vm.name" "java.vendor.url.bug" "java.io.tmpdir" "java.version" "user.dir" "os.arch" "java.vm.specification.name" "sun.os.patch.level" "native.encoding" "java.library.path" "java.vm.info" "java.vendor" "java.vm.version" "sun.io.unicode.encoding" "java.class.version")thanks |
Hi @amano-kenji, this reads like unedited AI-generated content and misses the core point if so. This isn't about implementing thanks |
|
I can write like chatbots because I talk with chatbots instead of humans most of the time. Chatbots tend to write in structured markdown format with headers and bullet points.
It took a while to wrap my head around what If we accept the theory that (defn seqable?
[x]
(if (isinstance x Iterator)
false
(isinstance x Iterable)))In python, iterator is iterable, so you need to first check whether |
|
@ikappaki I think there may be two separate issues being discussed here, so it may be useful for us to separate those out into two separate discussions. The first issue is that seq library functions don't coerce their inputs into seqs before operating on them. Clojure does this in most of it's seq library: (when-let [s (seq coll)]
;; ...
)This seems like a fairly uncontroversial change that we should make and merge. The second (and more contentious issue) is what happens when we attempt to call Perhaps we can make some progress on the former without blocking on the latter. Let me know what you think. |
|
In java, it seems you rarely encounter iterators instead of iterables. On the other hand, python generators seem common. Having to use |
Right, if I understood this correctly, this is what my argument is.
Yes, I advocate something along these lines.
Indeed, but allowing I consider supporting generators seamlessly with thanks |
Sure, would you mind opening a ticket for this please so I can commit the changes against it? I'm not entirely sure how to phrase it. This looks like a straightforward optimization with no functional changes to the current supported implementation. I believe what's in the MR now is sufficient, and we don't need any tests, since there's no change in functionality.
Unfortunately, I can't think of a way to make generators safely compatible with Please consider a couple of more example where using
> basilisp repl
basilisp.user=> (defn gen []
(dotimes [i 10]
(yield i)))
#'basilisp.user/gen
basilisp.user=> (def git (gen))
#'basilisp.user/git
basilisp.user=> (count git)
10
basilisp.user=> (count git)
0
Sure, as outlined above. thanks |
|
If we decide to support generators with a separate function, the function should be I'm not against But, I'm not the maintainer. Chris is. I read https://clojure.org/reference/sequences again. It says
Technically, |
|
Ok I think the more time I've spent thinking about it @ikappaki is right and we should just introduce
I'm not sure there's a clear solution for actual Python |
|
As far as I know, there are only iterators and iterables. A generator can be passed to a for statement because an iterator happens to also be an iterable that returns itself as the iterator. https://peps.python.org/pep-0255/ says
A generator function creates a generator which is a lazy iterator. It is a lazy iterator. If you call A generator function is a syntactic sugar for an iterator class. class Iterator
def __iter__(self):
return self
def __next__(self, ...): |
|
I believe I’ve come up with a plan that incorporates and addresses most points and concerns raised above. Basically, any iterable object can be coerced into a sequence using basilisp/src/basilisp/lang/seq.py Lines 221 to 233 in aa12514
However, this only works well in general if the input object is re-iterable—that is, if calling Consider the following simple example, where we check the number of items in the iterable, and if there’s more than one, we return the first item: $ basilisp repl
basilisp.user=> (defn count-first [input]
(when (> (count input) 1)
(first input)))
#'basilisp.user/count-first
basilisp.user=> (count-first #py [1 2 3])
1This works fine with Python’s built-in collection types, such as lists and tuples, since calling basilisp.user=> (import os)
nil
basilisp.user=> (count (os/scandir))
18
basilisp.user=> (first (os/scandir))
<DirEntry 'LICENSE'>
basilisp.user=> (count-first (os/scandir))
nilTherefore, single-use iterables should not be implicitly coerced into sequences, as this can lead to subtle and hard-to-diagnose bugs. Fortunately, there’s a way to detect and exclude these single-use iterables from implicit coercion. As @amano-kenji alluded, we can check whether Based on that, I plan to revise the def sequence(s: Iterable[T], fail_single_use: bool = True) -> ISeq[T]:
"""Create a Sequence from Iterable s."""
i = iter(s)
if fail_single_use and i is s:
raise TypeError(f"Single-use iterator type detected, please use iterator-seq instead :type {type(s)}")
def _next_elem() -> ISeq[T]:
try:
e = next(i)
except StopIteration:
return EMPTY
else:
return Cons(e, LazySeq(_next_elem))
return LazySeq(_next_elem)
def iterator_sequence(s: Iterable[T]) -> ISeq[T]:
"""Create a Sequence from Iterable s."""
return sequence(s, fail_single_use=False)I’ve already reviewed the Basilisp test suite and found at least one case where single-use iterators were likely dropping elements during iteration. I’ll open a separate PR to address that issue by promoting the fix described above. This PR will remain focused on optimizing the standard library by reducing unnecessary Please let me know your thoughts. Thanks |
|
@ikappaki that seems reasonable to me. Thanks! |
|
Hmm? Is |
I find the object comparison method to be more in line with my analysis: I'm looking to identify Iterable objects that are single-use iterables, i.e., objects that return themselves when I can imagine that one could write a custom class with these two methods, but the |
|
Okay. I think we have finally discovered a good approach. But, how will you implement Is something like this not going to be a bit inefficient? (defn seqable?
[x]
(and (isinstance x Iterable)
(not= x (iter x))))I don't know how often |
|
This is another implementation. (defn seqable?
[x]
(try
(not= x (iter x))
(catch TypeError _ false))) |
|
Both of the above suggestions are on the right track in my opinion. |
Hi, could you please consider patch to remove explicit support of single-use iterables in sequences. It fixes #1192. single-use iterables now throw a `TypeError`, when they are implicitly or explicitly coerced into a sequence. A new `iterator-seq` is introduced to explicitly convert them into a seq instead . See #1198 (comment) for the rationale. The `count` fn was also updated to behave the same. There were a few places where single use iterators were converted implicitly to a seq, which are now explicitly converted using a `iterator-seq` or the python equivalent instead. Tests are added for the same, as well as documentation section under Python Iterators. Thanks --------- Co-authored-by: ikappaki <[email protected]>
b22ad57 to
176baca
Compare
176baca to
b064f93
Compare
|
Hi @chrisrink10 , as discussed above, I've repurposed this PR as an optimisation patch to coerce key seq-consuming functions inputs to a sequence up front. It addresses #1234. There are no functional changes, thus the existing test coverage should be enough. Can you please review. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me. Had one minor question, but it doesn't materially change my opinion.
| (apply map f (rest coll) (map rest colls))))))) | ||
| (when-let [coll (seq coll)] | ||
| (let [colls (map seq colls)] | ||
| (when (every? some? colls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why some? rather than identity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
I was reasoning at the semantics level, just checking that every value wasn't nil, so I used (every? some?). I hadn't considered using identity, which relies on knowing that (map seq colls) returns either a seq or nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I missed the fact that the PR was already merged 😅

Update1: this PR was repurposed to focus on optimizing to seq-consuming library functions to coerce their inputs into seqs before operating on them. It fixes #1234.
Support generators in seq-consuming functions via the
when-let [coll (seq coll)]idiom where applicable. Fixes #1192